Improved Image Conversion and Encoding Techniques

The present invention is generally directed towards stereoscopic image
synthesis and more particularly toward an improved method of converting two
dimensional (2D) images for further encoding, transmission and decoding for
the purpose of stereoscopic image display. The Applicants have previously
described, in PCT/AU96/00820, a method of producing left and right eye images
for a stereoscopic display from an original 2D image including the steps of:

a. identifying at least one object within an original image;

b. outlining each object;

c. defining a depth characteristic for each object; and

d. respectively displacing selected areas of each object by a determined
amount in a lateral direction as a function of the depth characteristic of
each object, to form two stretched images for viewing by the left and right
eyes of the viewer.

These steps can be individually and collectively referred to as Dynamic
Depth Cuing or DDC.

The present invention further improves the operation of the Applicant's 
earlier system. 

The present invention provides in one aspect a method of encoding a
depth map including:

allocating an object number to an object;
allocating a depth to the object; and
defining the object outline.

The object outline may be defined by a series of co-ordinates and/or
curves.

In another aspect the present invention provides for the use of bezier 
curves to generate an outline of an object in a 2D to 3D conversion process. 

In yet a further aspect the present invention provides for the use of curves 
to define an object in a 2D to 3D conversion process. 
In another aspect the present invention provides a method of
transmission of depth map information wherein the information is included in the
Vertical Blanking Interval or MPEG data stream.

In still a further aspect the present invention provides for the use of
generic libraries to assist in the 2D to 3D conversion process.

To provide a better understanding of the present invention, reference is
made to the accompanying drawings which illustrate a preferred embodiment of
the present invention.

In the drawings:

Figures 1 and 2 show a preferred method of conversion from depth map
data to distortion grid.

Figures 3, 4 and 5 show various techniques of determining the outline of
an object as disclosed by the present invention.

Figure 6 shows a sample distortion grid.

Object identification

Objects in the 2D image to be converted may be identified by visual
inspection by a human operator. The operator will typically tag each object,
or group of objects, in the image using a computer mouse, light pen, stylus or
other device and assign a unique number to the object. The number may be
manually created by the operator or automatically generated in a particular
sequence by a computer.

Objects may also be identified fully automatically using a computer, or
semi-automatically, whereby an operator assists the computer to determine the
location of an object or objects.

To automatically identify an object the computer will use such
characteristics as object size, colour, speed of motion, shading, texture,
brightness, obscuration and focus, as well as differences between previous,
current and future images. Neural networks and expert systems may also be
used to assist with identifying objects.

In semi-automatic object identification an operator may provide
assistance to the computer by advising the computer as to the nature of the
image where objects may be found. For example the operator may advise the
computer that the scene is of the generic format "News Reader", in which case
the computer will attempt to locate the head and shoulders of the news reader,
desk and background etc. The operator may choose from a menu of possible
generic scenes. The operator may manually override and/or correct and adjust
any object selection made by the computer. The computer program may learn
from these corrections, using neural networks or expert systems for example, so
as to continually improve the accuracy of object identification and numbering.

Once an object has been identified and numbered the object may then
be tracked either manually, automatically or semi-automatically as it moves
within the image over successive frames.

An operator may also use object identification information produced by
another operator either working on the same sequence or from prior conversion
of similar scenes.

Object Outlining

The outline of an object or objects may be determined either manually, 
automatically or semi-automatically. 

In manual outlining the operator may trace the outline of the object or
objects using a computer mouse, light pen, stylus or other device. The operator
may select the outline of the object on a pixel by pixel basis, or use straight line or
curve approximations, bezier curves or a best fit from a library of curves or generic
shapes. The operator may also choose from a library of generic shapes which
may already be of approximately the correct shape and scale, or adjust the
shape dynamically to fit. For example the operator may wish to select the outline
of a man, in which case the generic outline of a man may be retrieved from the
library and adjusted accordingly, either manually, semi-automatically or
automatically.

In automatic outlining the computer may use such characteristics as size,
colour, speed of motion, shading, brightness, obscuration and differences between
previous, current and future images. Neural networks and expert systems
may also be used to determine the outline of objects.

In semi-automatic outlining an operator may provide assistance to the
computer by advising the computer as to the nature of the image where objects
may be found. For example the operator may advise the computer that the
scene is of the generic format "News Reader", in which case the computer will
attempt to locate the head and shoulders of the news reader, desk and
background etc. The operator may choose from a menu of possible generic
objects. The operator may manually override and/or correct and adjust any
object outlining made by the computer. The computer program may learn from
these corrections, using neural networks or expert systems for example, so as
to continually improve the accuracy of outlining.

Once an object has been outlined the object may then be tracked either
manually, automatically or semi-automatically as it moves within the image over
successive frames.

An operator may also use object outline information produced by another
operator either working on the same sequence or from prior conversion of
similar scenes. The operator may also choose from a library of predefined
outlines and either manually, semi-automatically or automatically adjust the
library outline to fit the selected object. The library may be indexed by individual
outlines, eg News Reader, or based upon a particular family of objects, eg Horse
Race, Evening News etc.

Defining depth

The depth of an object or objects may be determined either manually,
automatically or semi-automatically. The depth of the objects may be assigned
using any alphanumeric, visual, audible or tactile information. In the preferred
embodiment the depth of the object is indicated by shading the object with a
particular colour. Typically this will be white for objects that are to appear, once
converted, at a 3D position closest to the viewer and black for objects that are
at the furthest 3D distance from the viewer. Obviously this convention may be
altered, eg reversed, or colours may be used to indicate relative or absolute depth.

In another embodiment the depth of the object may be assigned a
numerical value. This value may be positive or negative, in a linear or
non-linear series, and contain single or multiple digits. In a preferred embodiment
this value will range from 0 to 255, to enable the value to be encoded in a single
byte, where 255 represents objects that are to appear, once converted, at a
3D position closest to the viewer and 0 represents objects that are at the furthest 3D
distance from the viewer. Obviously this convention may be altered, eg
reversed or another range used.
In manual depth definition the operator may assign the depth of the
object or objects using a computer mouse, light pen, stylus or other device. The
operator may assign the depth of the object by placing the pointing device
within the object outline and entering a depth value. The depth may be entered
by the operator as a numeric, alphanumeric or graphical value and may be
assigned by the operator or automatically assigned by the computer from a
predetermined range of allowable values. The operator may also select the
object depth from a library or menu of allowable depths.

The operator may also assign a range of depths within an object or a
depth range that varies with time, object location or motion or any combination
of these factors. For example the object may be a table that has its closest edge
towards the viewer and its farthest edge away from the viewer. When converted
into 3D the apparent depth of the table must vary along its length. In order to
achieve this the operator may divide the table up into a number of segments and
assign each segment an individual depth. Alternatively the operator may assign
a continuously variable depth within the object by shading the object such that
the amount of shading represents the depth at that particular position of the
table. In this example a light shading could represent a close object and dark
shading a distant object. For the example of the table, the closest edge would
be shaded lightly, with the shading getting progressively darker, until the furthest
edge is reached.

The variation of depth within an object may be linear or non-linear and
may vary with time, object location or motion or any combination of these factors.

The variation of depth within an object may be in the form of a ramp. A
linear ramp would have a start point (A) and an end point (B). The colour at
points A and B is defined. A gradient from point A to point B is applied, with the
colour constant along lines perpendicular to the line AB.

A Radial Ramp defines a similar ramp to a linear ramp although it uses
the distance from a centre point (A) to a radius (B).

A simple extension to the Radial Ramp would be to taper the outside rim,
or to allow a variable sized centre point.
A Linear Extension uses the distance from a line segment as opposed to the
distance from the perpendicular. In this example the colour is defined for the
line segment, along with the colour for the "outside". The colour along the line
segment is as defined, and the colour tapers out to the "outside" colour.

A variety of ramps can be easily encoded. Ramps may also be based on
more complex curves, equations, variable transparency etc.
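By way of illustration only, the linear and radial ramps described above may be sketched in Python as follows. The function names, the clamping of values beyond the end points and the use of a 0 to 255 depth scale are assumptions made for this sketch and are not taken from the specification.

```python
import math

def linear_ramp(px, py, a, b, depth_a, depth_b):
    """Depth at pixel (px, py) for a linear ramp from point a to point b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return float(depth_a)
    # Project the pixel onto the line AB. The result t is constant along
    # any line perpendicular to AB, which gives the ramp its banded look.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = min(1.0, max(0.0, t))  # assumption: hold the end values outside A..B
    return depth_a + t * (depth_b - depth_a)

def radial_ramp(px, py, centre, radius, depth_centre, depth_rim):
    """Depth at pixel (px, py) for a ramp from a centre point out to a radius."""
    cx, cy = centre
    if radius <= 0:
        return float(depth_centre)
    t = min(1.0, math.hypot(px - cx, py - cy) / radius)
    return depth_centre + t * (depth_rim - depth_centre)
```

For the table example, point A would sit on the closest edge with a light value and point B on the furthest edge with a dark value.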

In another example an object may move from the front of the image to the
rear over a period of frames. The operator could assign a depth for the object in
the first frame and a depth for the object in the last or subsequent scene. The
computer may then interpolate the depth of the object over successive frames
in a linear or other predetermined manner. This process may also be fully
automated whereby a computer assigns the variation in object depth based
upon the change in size of an object as it moves over time.
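The linear case of this interpolation may be sketched as follows. This is a minimal illustration only; the function name and the rounding of the result to a whole depth value are assumptions.

```python
def interpolate_depth(first_frame, first_depth, last_frame, last_depth, frame):
    """Linearly interpolate an object's depth between two keyed frames."""
    if last_frame == first_frame:
        return first_depth
    t = (frame - first_frame) / (last_frame - first_frame)
    t = min(1.0, max(0.0, t))  # frames outside the keyed range hold the end value
    return round(first_depth + t * (last_depth - first_depth))
```

An object keyed at depth 255 in frame 0 and depth 55 in frame 10 would, under these assumptions, be given depth 155 in frame 5.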

In automatic depth defining the computer may use such characteristics
as size, colour, speed of motion, shading, brightness, obscuration, focus and
differences between previous, current and future images. Neural networks
and expert systems may also be used to determine the depth of objects.

In semi-automatic depth defining an operator may provide assistance to
the computer by advising the computer as to the nature of the image where
depths are to be assigned. For example the operator may advise the computer
that the scene is of the generic format "News Reader", in which case the
computer will attempt to locate the head and shoulders of the news reader, desk
and background etc and place these in a logical depth sequence. The operator
may choose from a menu of possible generic objects and depths. The operator
may manually override and/or correct and adjust any object depth decision
made by the computer. The computer program may learn from these
corrections, using neural networks or expert systems for example, so as to
continually improve the accuracy of depth assigning.

Once an object has been assigned a specific depth the object may then
be tracked either manually, automatically or semi-automatically as it moves
within the image over successive frames.

An operator may also use depth definitions produced by another operator
either working on the same sequence or from prior conversion of similar scenes.

Depth Maps

The process of detecting objects, determining their outline and assigning
depths we will refer to as the creation of Depth Maps. In a preferred
embodiment, the depth maps would consist of grayscale images of
256x256 pixels at 8 bits per pixel to enable the objects within the associated 2D image to
be defined at one (or more for ramped depths) of 256 individual depths.

Encoding of depth maps

The depth maps may be encoded in a number of ways. In a preferred
embodiment the object number, depth and object outline are encoded as
follows. Consider the outline of a person shown in figure 3. The person is
allocated object number 1 with depth 20. The outline of the object has been
determined as previously explained and at specific x,y locations, typically
where a change in direction of the object outline takes place, a particular mark is
made. This mark may be an alphanumeric character, a shape, colour or other
form of visual indication. Each of these marks will have a specific x,y location.
In the preferred embodiment each coordinate will be within the range 0 to 255. Between
each pair of x,y locations will exist a curve. Each curve may be determined by
selection from a library of all possible curve shapes. In the preferred
embodiment each curve will be given a value typically within the range -128 to
+127 to enable the curve to be defined using one byte. Curves that progress
clockwise from one x,y location to the next x,y location may be assigned positive
values whilst those that progress anticlockwise may be assigned negative
values. Other assignments may be applied.

The depth map data for this object may therefore be defined as follows:

<object number><object depth><x1,y1,curve1,x2,y2,curve2,x1,y1>
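By way of illustration only, a record of this form may be packed into bytes as sketched below. The one byte per field layout follows the preferred embodiment, with the curve value stored as a signed byte; the function name and the absence of any length or terminator field are assumptions of this sketch, since the specification does not state how the end of a record is signalled.

```python
import struct

def encode_object(object_number, depth, segments):
    """Pack <object number><object depth><x1,y1,curve1,...,x1,y1> into bytes.

    segments is a list of (x, y, curve) tuples, where x and y lie in 0..255
    and curve is a signed library index for the curve leaving that point.
    """
    if not segments:
        raise ValueError("an outline needs at least one point")
    out = bytearray(struct.pack("BB", object_number, depth))
    for x, y, curve in segments:
        out += struct.pack("BBb", x, y, curve)  # 'b' is a signed byte
    first_x, first_y, _ = segments[0]
    out += struct.pack("BB", first_x, first_y)  # repeat x1,y1 to close the outline
    return bytes(out)
```

Object number 1 at depth 20 with two outline points therefore occupies ten bytes in this sketch.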

The object depth information contains the data required to generate the
depth of the current object. As previously mentioned, this depth data may be a
single value, a ramp (linear, radial or other), or other method of describing the
depth of a single object. The following methods demonstrate possible means of
encoding the depth data of a single object.

The depth data may be encoded as follows for a single depth value:

<depth flag 1><depth value>

The depth data may be encoded as follows for an object with a linear
ramp as its depth value:

<depth flag 2><x1,y1,depth value1,x2,y2,depth value2>

where the depth of the object varies linearly from value 1 at x1,y1 to value 2 at
x2,y2.

The depth data may be encoded as follows for an object with a non-linear
ramp as its depth value:

<depth flag 3><x1,y1,depth value1,x2,y2,depth value2,gamma>

where gamma is a value that describes the non-linear variation of depth over
the range between x1,y1 and x2,y2.

The depth data may be encoded as follows for an object with a radial
ramp as its depth value:

<depth flag 4><x1,y1,depth value1,radius,depth value2>

where the object has depth value 1 at x1,y1 and the depth varies linearly or
otherwise to a value of depth value 2 at all points radius pixels away from x1,y1.
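The four depth data formats above may be sketched as byte packers as follows. One unsigned byte per field is assumed throughout. In particular the specification does not define the scale or width of gamma, so gamma_byte is treated here as an opaque one byte parameter; the function names are likewise hypothetical.

```python
import struct

def encode_single_depth(depth):
    """<depth flag 1><depth value>"""
    return struct.pack("BB", 1, depth)

def encode_linear_ramp(x1, y1, depth1, x2, y2, depth2):
    """<depth flag 2><x1,y1,depth value1,x2,y2,depth value2>"""
    return struct.pack("7B", 2, x1, y1, depth1, x2, y2, depth2)

def encode_nonlinear_ramp(x1, y1, depth1, x2, y2, depth2, gamma_byte):
    """<depth flag 3><x1,y1,depth value1,x2,y2,depth value2,gamma>"""
    return struct.pack("8B", 3, x1, y1, depth1, x2, y2, depth2, gamma_byte)

def encode_radial_ramp(x1, y1, depth1, radius, depth2):
    """<depth flag 4><x1,y1,depth value1,radius,depth value2>"""
    return struct.pack("6B", 4, x1, y1, depth1, radius, depth2)
```

Because the flag is the first byte, a decoder can determine from it how many bytes of depth data follow.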
It will be understood that once an object's depth map has been
transmitted it is not necessary to transmit the depth map again until the object
moves or changes shape. Should only the object's position change then the
new position of the object may be transmitted by assigning an offset to the
object's position as follows:

<object number><xoffset, yoffset>

Similarly, should the object's depth change and not its position or size, the
following may be transmitted:

<object number><depth>

It will also be understood that adjacent touching objects will share x,y
coordinates and that redundancy therefore exists in the x,y coordinates that
need to be transmitted to uniquely define the depth maps of every object in the
scene.

In order to minimise the amount of additional data required to be 
transmitted or stored it is desirable to compress the data comprising the depth 
maps. The compression can use any form of data compression algorithm and 
many will be known to those skilled in the art. 

It will be appreciated that the depth map information may be included in
the Vertical Blanking Interval (VBI) of an analogue television signal or MPEG or
other digital transmission stream of a digital television signal as has previously
been disclosed for distortion mesh transmission.

It is obvious to anyone skilled in the art how the data may be included in
the VBI and the MPEG data stream, and the preferred embodiment is the
technique currently used for including Closed Captioning and Teletext within
standard television images.

In terms of including this data in the VBI or MPEG2 stream the following 
calculations indicate the likely size of the data requirements. 

Assuming:

the VBI specification allows for 32 Bytes/video line
the maximum number of objects per image = 20
the maximum X,Y coordinates per object = 20
that the Object #, Object depth, X, Y, and shape data each takes 1 Byte

Then the bytes/object = 1+1+3(20) = 62 Bytes
Hence for 20 objects VBI data = 20x62 = 1240 Bytes/frame

It should be noted that this is the worst case and in practice a typical scene
requires 200 Bytes/frame. This value will decrease significantly with the
application of suitable data compression and taking into account redundancy
etc.
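The above calculation may be reproduced as follows. The conversion into a number of VBI lines is an additional step that follows from the stated 32 Bytes/video line and is not given in the text above; the function names are illustrative only.

```python
import math

def depth_data_bytes(objects, points_per_object):
    """Worst case bytes per frame: object number + depth + (x, y, shape) per point."""
    per_object = 1 + 1 + 3 * points_per_object
    return objects * per_object

def vbi_lines_needed(total_bytes, bytes_per_line=32):
    """Number of whole VBI lines required to carry total_bytes."""
    return math.ceil(total_bytes / bytes_per_line)
```

On these assumptions the worst case of 1240 Bytes/frame would occupy 39 video lines, and the typical case of 200 Bytes/frame would occupy 7.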

In respect of including this information within an MPEG data stream, the
MPEG standard allows for the delivery of a data stream to the receiving location
with up to 64kbps capacity. Techniques to provide delivery of data within an
MPEG stream may be used to deliver the depth map data to the receiving
decoder. It is also possible to include this information in one of the sound
channels of the MPEG signal. Where the MPEG signal is recorded on a
medium such as CD-ROM or DVD then the information may be contained within
a digital audio file, as a separate digital or analog file, or recorded on the disk by
other means. Other techniques will be obvious to those skilled in the art.

It is also possible to transmit the original depth map as part of the MPEG
data stream. In a preferred embodiment the resolution of the depth map may be
reduced from typically 640x480x8 pixels to 80x60x8 pixels before noticeable
errors in the depth of objects in the resulting 3D images become apparent. At this
resolution each depth map pixel corresponds to one 8x8 DCT block in an MPEG
encoded video signal.
Hence the depth map information may be included in the MPEG signal by
adding additional information to the DCT block that defines the depth of each
block when converted into 3D. The depth map may also be included in the
MPEG data stream as previously described, eg in an audio channel, or by other methods
familiar to those skilled in the art. The reduced resolution depth map may also
be compressed, prior to inclusion in the MPEG stream, using standard image
compression techniques including, but not limited to, JPEG, MJPEG, MPEG
etc.
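By way of illustration only, the reduction from 640x480 to 80x60 may be sketched as an average over each 8x8 block. The specification does not state how the reduction is performed, so block averaging, the integer result and the function name are all assumptions of this sketch.

```python
def downsample_depth_map(depth_map, block=8):
    """Reduce a depth map (list of rows of 0..255 values) by averaging blocks."""
    height = len(depth_map)
    width = len(depth_map[0])
    if height % block or width % block:
        raise ValueError("depth map size must be a multiple of the block size")
    reduced = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            total = sum(
                depth_map[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total // (block * block))  # one depth value per block
        reduced.append(row)
    return reduced
```

Each value in the reduced map then describes the depth of one 8x8 block of the associated image.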
In a further preferred embodiment the object outline is defined using
bezier curves. Consider the outline of a person shown in figure 4. Bezier
curves are applied to the outline which result in the x,y coordinates shown. The
depth map for the object may therefore be defined as

<object number><object depth><x1,y1,x1a,y1a,x2b,y2b,x2,y2,....x1b,y1b>

Bezier curves may also be generated that require only 3 x,y coordinates
as illustrated in figure 5 and may be defined as follows

<object number><object depth><x1,y1,x1a,y1a,x2,y2,....x8a,y8a>

This method is preferable since it requires a smaller number of elements
to define the curve.
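A curve defined by three x,y coordinates may be evaluated as sketched below. This assumes that the three coordinates are the start point, a single control point (the "a" point) and the end point of a standard quadratic bezier curve; that interpretation, the function name and the number of steps are assumptions rather than statements from the specification.

```python
def quadratic_bezier(start, control, end, steps=16):
    """Return steps + 1 points along a quadratic bezier curve."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1.0 - t
        # Standard quadratic bezier blend of the three coordinates.
        x = u * u * start[0] + 2 * u * t * control[0] + t * t * end[0]
        y = u * u * start[1] + 2 * u * t * control[1] + t * t * end[1]
        points.append((x, y))
    return points
```

A decoder could join the returned points with straight lines to rebuild one segment of the object outline.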

It will also be appreciated that the outlines and/or depth maps created by
either of these two methods, either compressed or uncompressed, may be
stored in any suitable analogue or digital format and medium, either with or
without their associated 2D images. The storage may include, but is not limited to,
floppy disk, hard disk, CD-ROM, laser disk, DVD, RAM, ROM, magnetic
recording tape, video tape, video cassette etc. The stored outlines and/or
depth maps may be recalled at a later time and/or place to enable the
reconstruction of the depth maps for the generation of distortion meshes for the
generation of 3D images or for further adjustment and fine tuning.

Decoder

Previously it has been disclosed that a distortion mesh may be used to 
convert a 2D image into 3D. 

It is now possible to generate the necessary distortion grid from a depth
map, this depth map itself being generated from additional information
transmitted within the 2D video. The generation of a distortion grid from a depth
map may take place in realtime, semi-realtime or offline and may be undertaken
locally or, via any suitable transmission medium, at a remote location. The
generation may be implemented in software or hardware.

Thus rather than transmit the sub pixel points of the distortion mesh as
part of the 2D image the information necessary to re-create the depth map may
be transmitted. The depth map may then be reconstructed at the decoder and
the conversion into a distortion grid undertaken. These conversions may be
undertaken in either realtime, semi-realtime or offline at the receiving location
and may be implemented in software or hardware. The preferred method of
conversion from depth map data into depth map then to distortion grid is as
shown in a software flow chart in figure 1 and in hardware in figure 2. The
individual elements of the software conversion process function as follows:

Image Sequence Source - 2D Film or Video or some other image sequence
source.

Area & Depth Source - This is the information that is sent with the Image
Sequence and in the preferred embodiment is contained in the VBI or MPEG
data stream. It contains information as to the position, shape and depth of each
object.

Apply Areas with Depths to Depth Map - To render an object the "area"
within the object is filled/shaded according to the depth information. All area
outside the shaded area is left untouched. This process results in the
reconstruction of the original depth maps.

Blur Depth Map - The hard depth map is then blurred (gaussian, fast or
other) to remove any hard edges. The blurring provides a smooth transition
between the objects in order to eliminate image overlapping. The blurring is
slightly weighted in the horizontal direction. The vertical blur helps stop image
tearing by bleeding into the images above and below thus giving a smoother
transition between near and far objects.
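By way of illustration only, the blur may be sketched as two passes of a simple box filter, with a larger radius horizontally than vertically to give the horizontal weighting. The choice of a box filter, the default radii, the truncation of the window at the image edges and the function names are assumptions of this sketch.

```python
def box_blur_1d(values, radius):
    """Average each value with its neighbours within the given radius."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))  # window is truncated at the edges
    return out

def blur_depth_map(depth_map, h_radius=3, v_radius=2):
    """Blur rows first, then columns; h_radius > v_radius weights the blur horizontally."""
    rows = [box_blur_1d(row, h_radius) for row in depth_map]
    cols = [box_blur_1d(list(col), v_radius) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

A hard edge between a black and a white object becomes a gradual slope, so that the displacement applied later changes smoothly across the edge.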

Process Image using Depth Map - The blurred depth map is then used as
a source for displacement of the distortion grid, white being maximum
displacement, black being no displacement. The amount of distortion along the
horizontal axis is scaled according to the depth of the depth map at any given
pixel location. In the preferred implementation the displacement for the left
image is to the right, the right image displacement to the left. An overall forced
parallax may be applied to the image so that the white/maximum displaced
objects are converged at screen level. The black (background) areas will then
have a forced parallax equal to an unshifted image. The direction of
displacement, and forced parallax, may be varied to suit the particular
requirements of the 3D display system on which the converted images are to be
displayed.
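The scaling of displacement by depth may be sketched as follows for one row of the depth map. Positive offsets denote displacement to the right. The value of max_shift, the function name and the modelling of forced parallax as a constant shift that brings the white objects back to zero displacement are assumptions of this sketch and only one possible reading of the above.

```python
def pixel_offsets(depth_row, max_shift=8.0, forced_parallax=True):
    """Return (left_offsets, right_offsets) in pixels for one row of depths 0..255."""
    left, right = [], []
    for depth in depth_row:
        shift = max_shift * depth / 255.0  # white (255) gives maximum displacement
        if forced_parallax:
            shift -= max_shift  # white objects end up at screen level
        left.append(shift)    # left image displaced to the right
        right.append(-shift)  # right image displaced to the left
    return left, right
```

Reversing the signs would give the opposite direction of displacement where the display system requires it.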

Once the distortion grid has been generated, the conversion of the 2D 
image into 3D is undertaken as previously disclosed. 

A preferred embodiment of a hardware converter to generate separate
left and right images from a distortion grid is shown in figure 2, which could be
fully digital. A method of implementing this process is shown in figure 2a and
operates as follows.

The system uses two line stores, which are multi-ported to allow
simultaneous access. A line of video is written into one of the line stores while
the other line store is being read to generate the output video signal. At the end
of the current line the line stores are swapped.

The depth information is extracted from the video signal to regenerate the
depth map for the current image. For each output pixel, the depth map is
translated into a pixel offset (of the distortion grid). The pixel offset is added to
the pixel counter as the video line is read out of the line store. The pixel offset is
a fractional value, so it is necessary to read the pixel values each side of the
desired pixel and interpolate the intermediate value. The odd/even field signal
from the video decoder is used to control the field sequential video output and to
synchronise the viewer's shutter glasses to the output video signal. The basic
circuitry may be duplicated to generate separate left and right video signals for
3D displays that require this video format.
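The read-out of a line store with a fractional pixel offset may be modelled in software as sketched below. Linear interpolation between the two neighbouring pixels and the clamping of read positions to the ends of the line are assumptions of this sketch, as is the function name.

```python
import math

def read_line_with_offsets(line, offsets):
    """Read one stored video line, adding a fractional offset to the pixel counter."""
    last = len(line) - 1
    out = []
    for counter, offset in enumerate(offsets):
        pos = min(float(last), max(0.0, counter + offset))  # clamp to the line
        left = math.floor(pos)
        right = min(last, left + 1)
        frac = pos - left
        # Interpolate between the pixels each side of the desired position.
        out.append(line[left] * (1.0 - frac) + line[right] * frac)
    return out
```

With all offsets at zero the line is read out unchanged.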

A functional block diagram of the DDC Decoder is shown in Figure 2b.
The first process is to extract the object data from the incoming video which may
be inserted in the VBI or MPEG data stream. The extracted data will be in
compressed format and is subsequently decompressed using a microprocessor.
The output from the microprocessor is the original object outline information
and is again processed to produce the depth information for each object. This
data is passed to a set of three rotating field buffers, the buffers being controlled
by a microprocessor. The first buffer recreates the original depth maps. The
depth maps are then passed to the next buffer where the horizontal and vertical
blurs are applied. Once the blurring has been applied the resulting data is
applied to the final buffer where the data is passed to the Depth to Pixel Offset
Converter shown in Figure 2a. Once the data has been transferred to the Offset
Converter the final buffer is cleared and is ready to receive the next depth map.

The DDC Decoder process is illustrated in Figure 2c. This shows the
process as a timing diagram and assumes that current microprocessors are not
sufficiently fast to undertake all the decoding processes simultaneously. The
decoding process is therefore undertaken sequentially in a pipeline process.
As microprocessor performance improves it is expected that a number, if not all,
of these processes will be undertaken simultaneously. In Figure 2c (1) four
frames of video are shown, each frame comprising odd and even fields. At (2)
the object list for frame four is generated whilst at (3) the depth map for frame 4
is generated. At (4) the horizontal and vertical blurs are applied and at (5) the
depth map for frame 4 is output and the buffer is cleared ready for the next
object list. At (5) therefore the depth map for frame 4 and the 2D image are
concurrently available to enable the conversion into 3D. It should be noted that
Figure 2c illustrates the process for an individual frame and in practice, at any
one time, depth maps for four different frames are being generated by different
sections of the hardware.

It will also be appreciated that it is not necessary to transmit the entire
depth map to the receiver since the same depth maps will be reused when the
same or a similar scene is displayed again. It is therefore desirable that the
decoder retains in memory a sequence of previously transmitted depth maps for
reuse rather than being required to re-process a depth map that has been sent
previously. Either the depth map or the resulting distortion mesh may be
retained in the decoder's memory which may be volatile or non-volatile and
comprises, although is not limited to, RAM, EEPROM, flash memory, magnetic or
optical storage etc. It is also intended that generic depth maps and/or distortion
grids be stored in the decoder. This will enable frequently occurring scenes to
be converted without the need to transmit or convert the depth map. The correct
depth map may be selected by including data in the video signal that uniquely
identifies to the decoder which default depth map to apply. It is also intended
that the decoder should have the capability of receiving new or altered depth
maps so as to enable a library of depth maps and/or distortion grids to be
maintained within the decoder. This library may be held within, although not
limited to, the following media: RAM, EEPROM, flash memory, magnetic or
optical storage etc. It is intended that the library be updated by the transmission
of specific depth maps or distortion grids that are included in the video signal. It
is also intended that the library could be maintained by means of external or
internal plug-in modules containing such depth maps or distortion grids and by
downloading to the decoder via the video signal, modem or the Internet. Other
means of maintaining the library will be obvious to those skilled in the art.

The general format of DDC Data included in the video signal may, in the 
preferred embodiment, include a header flag which indicates to the decoder 

1 5 the nature of the following data. A number of existing standards could be used 
for this format which in general will have the following format; 

<Flag#xdata to be acted upon by the decoder> 

Examples of flags include, although are not limited to, the following:

Flag 1 - The following data is a depth map
Flag 2 - The following data relates to the relocation of an existing object
Flag 3 - The following data relates to the change in depth of an object
Flag 4 - The following data relates to the reuse of a previously
transmitted depth map
Flag 5 - The following data relates to the use of a depth map within the
library
Flag 6 - The following data relates to the modification of a depth map
within the library
Flag 7 - The following data relates to the addition of a new depth map
within the library
Flag 8 - The following data relates to the deletion of an existing library
depth map
Flag 9 - The following data relates to the use of motion parallax delays
Flag 10 - The following data relates to the use of forced parallax
Flag 11 - The following data relates to the use of a mathematical
algorithm
Flag 12 - The following data relates to the use of a mathematical
algorithm library
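
As an illustration only, the flag-then-data layout could be handled by a small dispatcher such as the following sketch. It assumes a textual packet of the form `<Flag N><payload>`; the disclosure does not fix a concrete encoding, so the regular expression, the function name `parse_ddc_packet` and the table `FLAG_MEANINGS` are hypothetical.

```python
import re

# Abbreviated meanings taken from the flag list in the text.
FLAG_MEANINGS = {
    1: "depth map",
    2: "relocation of an existing object",
    3: "change in depth of an object",
    4: "reuse of a previously transmitted depth map",
    5: "use of a depth map within the library",
    6: "modification of a depth map within the library",
    7: "addition of a new depth map within the library",
    8: "deletion of an existing library depth map",
    9: "use of motion parallax delays",
    10: "use of forced parallax",
    11: "use of a mathematical algorithm",
    12: "use of a mathematical algorithm library",
}

# Assumed textual layout: "<Flag N><payload>" (not specified by the source).
PACKET_RE = re.compile(r"^<Flag\s*(\d+)>\s*<([^<>]*)>$")


def parse_ddc_packet(packet):
    """Split a textual DDC packet into (flag, meaning, payload)."""
    match = PACKET_RE.match(packet.strip())
    if match is None:
        raise ValueError(f"not a DDC packet: {packet!r}")
    flag = int(match.group(1))
    if flag not in FLAG_MEANINGS:
        raise ValueError(f"unknown flag: {flag}")
    payload = match.group(2).strip()
    return flag, FLAG_MEANINGS[flag], payload
```

A decoder built this way would branch on the returned flag and interpret the payload accordingly, for example as a depth map, an object relocation or a library index.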

Alternatively, each type of data packet could have a different length,
which would uniquely identify each packet and alleviate the need for a flag.

In the preceding description the same process could be applied to 
distortion grids. 

It is also intended that the decoder should be able to determine the most
suitable depth map to apply to the associated 3D image by automatically
making a selection from a nominated range within the library. For example, the
DDC data could direct the decoder to search the library of depth maps between
specific index points or by generic category, e.g. Evening News, Horse Race. The
decoder would then select the appropriate map based upon object size, shape,
speed, direction, colour, shading, obscuration etc.
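
The selection step might be sketched as follows. This is a minimal illustration rather than the decoder's actual method: the `LibraryEntry` fields, the use of only two features (size and speed) and the simple absolute-difference score are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    index: int           # position of the depth map in the library
    category: str        # generic category, e.g. "Horse Race"
    object_size: float   # hypothetical feature stored with the map
    object_speed: float  # hypothetical feature stored with the map


def select_depth_map(library, observed_size, observed_speed,
                     index_range=None, category=None):
    """Pick the closest-matching entry inside the nominated range or category."""
    candidates = [
        entry for entry in library
        if (index_range is None
            or index_range[0] <= entry.index <= index_range[1])
        and (category is None or entry.category == category)
    ]
    if not candidates:
        return None
    # Assumed score: sum of absolute feature differences (smaller is better).
    return min(
        candidates,
        key=lambda e: abs(e.object_size - observed_size)
        + abs(e.object_speed - observed_speed),
    )
```

Other characteristics named in the text, such as shape, direction, colour, shading or obscuration, could be added to the score in the same way.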

As a by-product of the decoding process, the original depth map, created
during the encoding process, can be made available in a suitable format for use
with 3D display systems that require a 2D image and object depth information.
These displays may be autostereoscopic and/or volumetric in nature.

Alternative approaches

Alternatively, the mesh distortion process may be defined by a
mathematical algorithm. This algorithm may be stored in the decoder and the
DDC data then comprises the parameters to which the algorithm is applied. For
example, consider the general formula

\[
f(x,y) = \left[1 - \exp\!\left(-\left|\,(|x| - \mathit{rx})\cdot \mathit{dx}\,\right|\right)\right]\cdot \sin\!\left(\frac{\pi x}{\mathit{rx}} + \frac{\pi}{2}\right)\cdot \left[1 - \exp\!\left(-\left|\,(|y| - \mathit{ry})\cdot \mathit{dy}\,\right|\right)\right]\cdot \sin\!\left(\frac{\pi y}{\mathit{ry}} + \frac{\pi}{2}\right)
\]

where

PI (\pi in the formula) - the constant 3.14159...

|x| - absolute value of x

rx - range of x, -rx <= x <= rx

ry - range of y, -ry <= y <= ry

dx - damping factor for x

dy - damping factor for y

If the following values are passed to the equation via the DDC data then
the distortion grid in Figure 6 is produced:

rx = ry = 50

dx = dy = 0.1

In terms of DDC data the following would be transmitted 

<Flag 11><50,50,0.1,0.1>
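
To make the role of the transmitted parameters concrete, the sketch below evaluates the general formula on a coarse grid. It assumes that the exponent is the absolute value of (|x| - rx) . dx (and likewise for y), and that rx, ry and the sampling step are integers; the function names and the step size are illustrative, and no claim is made that the output matches Figure 6 exactly.

```python
import math


def distortion(x, y, rx, ry, dx, dy):
    """Evaluate f(x, y) under the assumed reading of the formula."""
    term_x = (1.0 - math.exp(-abs((abs(x) - rx) * dx))) \
        * math.sin(math.pi * x / rx + math.pi / 2)
    term_y = (1.0 - math.exp(-abs((abs(y) - ry) * dy))) \
        * math.sin(math.pi * y / ry + math.pi / 2)
    return term_x * term_y


def distortion_grid(rx, ry, dx, dy, step):
    """Sample f over -rx..rx and -ry..ry (integer ranges) every `step` units."""
    return [
        [distortion(x, y, rx, ry, dx, dy) for x in range(-rx, rx + 1, step)]
        for y in range(-ry, ry + 1, step)
    ]


# Parameters as carried by the Flag 11 example: rx = ry = 50, dx = dy = 0.1.
grid = distortion_grid(50, 50, 0.1, 0.1, 10)
```

Under this reading the function vanishes on the boundary |x| = rx or |y| = ry, since the exponential term equals 1 there, so the distortion is confined to the interior of the grid.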

Additionally, these parameters may be stored in memory within the
decoder in the form of a library and recalled by sending the library index within
the DDC data.

In terms of DDC data the following would be transmitted:

<Flag 12><library index>
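
The difference between sending the parameters directly (Flag 11) and recalling them from a stored library (Flag 12) can be sketched as below. The comma-separated payload, the integer library index and the name `resolve_parameters` are assumptions for illustration; the disclosure does not define how the index or the parameters are encoded.

```python
def resolve_parameters(flag, payload, library):
    """Return (rx, ry, dx, dy) from inline data or from the decoder library."""
    if flag == 11:
        # Assumed payload: four comma-separated numbers "rx,ry,dx,dy".
        rx, ry, dx, dy = (float(value) for value in payload.split(","))
        return rx, ry, dx, dy
    if flag == 12:
        # Assumed payload: an integer index into the stored parameter library.
        return library[int(payload)]
    raise ValueError(f"flag {flag} does not carry algorithm parameters")
```

For example, a decoder holding `{3: (50.0, 50.0, 0.1, 0.1)}` in its library would obtain the same parameters from index 3 as from the inline data `50,50,0.1,0.1`.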

The distortion mesh may also be obtained in real time by the addition of a
camera to an existing 2D video or film camera, which, using a variable focus
lens and a sharpness detecting algorithm, determines the depth of objects in
the image being viewed by the camera. Object depth may also be obtained from a
stereo pair of cameras, whereby correlation between pixels in each image
indicates object depth. The output from these configurations, before processing
to provide distortion mesh data, may be used to generate depth maps. This is
achieved by processing the original 2D image and applying shading, or other
indications, to indicate object depth as explained in this disclosure. The outline
of each object may be obtained from object characteristics such as object size,
colour, speed of motion, shading, texture, brightness and obscuration, as well as
differences between previous, current and future images. Neural networks
and expert systems may also be used to assist with identifying objects. It is also
proposed to shift the image within the camera so that a physical offset of
subsequent images on the camera's image sensor is obtained. This shift may
be produced optically, electro-optically, mechanically, electro-mechanically,
electronically or by other methods known to those skilled in the art. The shift may
be in a single direction, i.e. x, or in multiple directions, either sequentially or
randomly. The shift of objects on the camera's sensor will be greater for those
objects that are closer to the camera. By correlating the pixels in successive
images the depth of each object may be determined. Alternatively, a plurality of
cameras could be used.
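
The pixel correlation described above can be illustrated with a one-dimensional block match along a single scan line. This is a simplified sketch, not the method of the disclosure: it uses the sum of absolute differences as the correlation measure and searches in one direction only, and the function name `best_shift` is hypothetical.

```python
def best_shift(reference_row, shifted_row, start, width, max_shift):
    """Find the shift (0..max_shift) at which a block of pixels best matches.

    The block reference_row[start:start + width] is compared against
    shifted_row at each candidate offset using the sum of absolute
    differences; the offset with the lowest cost is returned.
    """
    block = reference_row[start:start + width]
    best, best_cost = 0, float("inf")
    for shift in range(max_shift + 1):
        candidate = shifted_row[start + shift:start + shift + width]
        if len(candidate) < width:
            break  # ran off the end of the scan line
        cost = sum(abs(a - b) for a, b in zip(block, candidate))
        if cost < best_cost:
            best, best_cost = shift, cost
    return best
```

Consistent with the text, a block that returns a larger shift between the two images would be taken to belong to an object closer to the camera.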

Other techniques may be used to determine the depth of objects within a
scene. These include, but are not limited to, the use of range finders operating
on optical, laser, ultrasonic or microwave principles, or the projection of grids
over objects within the scene and determining the depth of an object from the
resulting distortion of the grids.

A number of Computer Aided Drawing (CAD) software packages enable 
wire frame models of the images being drawn to be produced. These wire 
frame models, which are a projection of the facets of the object, can be used to 
determine the position of objects within a scene. 



Similarly, part of the rendering process of 3D non-stereoscopic images
from packages like 3D Studio allows the distance from the camera to each pixel
to be output. This render can produce a gray scale image in which the closest
object appears white and the furthest point from the camera appears black.
This gray scale map may be used as a compatible depth map for conversion
into stereoscopic 3D.
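
The mapping from per-pixel camera distance to a gray scale depth map might look like the following sketch. The 8-bit range (0 to 255) and the linear scaling are assumptions; the text only states that the closest object appears white and the furthest point black.

```python
def distances_to_grayscale(distances):
    """Map per-pixel camera distances to gray levels (nearest = 255, furthest = 0)."""
    flat = [d for row in distances for d in row]
    nearest, furthest = min(flat), max(flat)
    span = furthest - nearest
    if span == 0:
        # Every pixel is at the same distance; treat the whole image as nearest.
        return [[255 for _ in row] for row in distances]
    # Assumed linear scaling onto an 8-bit gray scale.
    return [
        [round(255 * (furthest - d) / span) for d in row]
        for row in distances
    ]
```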